ABSTRACT



In accordance with the invention, a method and system for 
accessing a dialog system employing a plurality of different 
clients, includes providing a first client device for accessing 
a conversational system and presenting a command to the 
conversational system by converting the command to a form 
understandable to the conversational system. The command 
is interpreted by employing a mediator, a dialog manager 
and a multi-modal history to determine the intent of the 
command based on a context of the command. A second 
client device is determined based on a predetermined device 
preference stored in the conversational system. An application
is abstracted to perform the command, and the results of
the performance of the command are sent to the second client
device.

29 Claims, 5 Drawing Sheets 
METHOD AND SYSTEM FOR
MULTI-CLIENT ACCESS TO A DIALOG 
SYSTEM 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention relates to conversational computer 
systems, and more particularly to conversational systems 
with automatic speech recognition, natural language under- 
standing and dialog management for multi-client access. 

2. Description of the Related Art

State-of-the-art conversational systems such as those
described in Lamel et al.,
"The LIMSI ARISE System for Train Travel Information,"
International Conference on Acoustics, Speech and Signal 
Processing, Phoenix, Arizona, March 1999 and Ward et al., 
"Towards Speech Understanding Across Multiple 
Languages," International Conference on Spoken Language 
Processing, Sydney, Australia, December 1998, have 
focused on a single access method (limited to either a 
desktop or a telephone). As more and more information is 
available in electronic form, with the information interaction 
becoming increasingly complex, it is desirable to provide 
access to information using the most natural and efficient 
interfaces. In particular, it is desirable to provide efficient 
interfaces with several devices (such as desktops, telephones 
and personal digital assistants (PDAs)) that can potentially 
be used to access information and to design interfaces that 
are similar and intuitive across a wide range of access 
methods and input/output modalities. However, such sys- 
tems pose a design challenge due to the complexity needed 
to realize such a design. 

Therefore, a need exists for a system and method for
multi-client access to a dialog system. A further need exists
for a multi-client access system which provides an efficient 
platform for natural speech interaction. 

SUMMARY OF THE INVENTION 
In accordance with the invention, a method, which may be 
implemented by a program storage device readable by 
machine, tangibly embodying a program of instructions 
executable by the machine, for accessing a dialog system 
employing a plurality of different clients, includes providing 
a first client device for accessing a conversational system 
and presenting a command to the conversational system by 
converting the command to a form understandable to the 
conversational system. The command is interpreted by 
employing a dialog manager and a multi-modal history to 
determine the intent of the command based on a context of 
the command (and possibly a type of device employed to 
present the command). A second client device is determined 
based on a predetermined device preference stored in the 
conversational system. An application is abstracted to per- 
form the command, and the results of the performance of the 
command are sent to the second client device. 
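
The interpretation and routing steps above can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names, fields and the identifier "message-17" are hypothetical, and the fallback of returning results to the input device when no preference is stored is an assumption. The sketch follows the "delete that" example from the detailed description, in which an object selected graphically on one device is later referred to by voice from another.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    device: str    # client device that generated the event
    modality: str  # "spoken" or "graphical"
    action: str    # e.g. "select"
    target: str    # object the event acted on


@dataclass
class MultiModalHistory:
    events: list = field(default_factory=list)

    def record(self, event: Event) -> None:
        self.events.append(event)

    def resolve_reference(self):
        """Return the most recently referenced object, whatever device or modality produced it."""
        for event in reversed(self.events):
            if event.target:
                return event.target
        return None


@dataclass
class DevicePreference:
    outputs: list = field(default_factory=list)

    def output_devices(self, input_device: str) -> list:
        # Assumption: with no stored preference, results go back to the input device.
        return list(self.outputs) or [input_device]


def interpret(command: str, argument: str, history: MultiModalHistory) -> tuple:
    """Replace a deictic argument such as "that" with the object found in the history."""
    if argument == "that":
        argument = history.resolve_reference()
    return (command, argument)


history = MultiModalHistory()
history.record(Event("desktop", "graphical", "select", "message-17"))
preference = DevicePreference(outputs=["pda"])

# "delete that" spoken from a telephone after a mouse click on the desktop.
resolved = interpret("delete", "that", history)
targets = preference.output_devices(input_device="telephone")
print(resolved, targets)
```

Keeping the history independent of any one device is what allows the spoken reference to resolve against a graphical event, and keeping the preference separate from the input path is what allows the first and second client devices to differ.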

In other methods, which may be implemented by a
program storage device, the first client device may be the same
as the second client device or the second client device may
be a plurality of client devices. The first client device and the 
second client device may include at least one of a telephone, 
a computer, a personal digital assistant or equivalent 
devices. The command may be presented in natural speech, and
the steps of recognizing the speech and converting the
speech to a formal language may be included. The steps of
outputting the results of the performance of the command to
the second client device by speech synthesis may also be
included. The command may be presented graphically, and
the step of responding to the command by one or both of a
graphical result and synthesized speech may be included.
The step of providing a component abstraction interface to
interface with applications such that the conversational
system is shielded from details of execution of the application
may be included. The step of querying a user via the first
client device for information about the device preference
and/or clarification of command information may also be
included. The mediator preferably employs information
about the first client device and/or the second client device
to determine the context of the command.
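
As a rough illustration of converting recognized speech to a formal language, the following Python sketch maps differently worded utterances to the same formal command. The keyword patterns and the function name to_formal_command are assumptions made for this example; the formal commands check_new_mail() and select_device(output_device=desktop) are the ones given as examples in the detailed description. A real natural language understanding unit would be considerably richer than pattern matching.

```python
import re

# Hypothetical patterns: (compiled pattern, formal command template).
PATTERNS = [
    (re.compile(r"\bnew (?:mail|messages?)\b"), "check_new_mail()"),
    (re.compile(r"\bsend the output to my (\w+)"), "select_device(output_device={0})"),
]


def to_formal_command(utterance: str):
    """Map recognized text to a formal command string, or None if nothing matches."""
    text = utterance.lower()
    for pattern, template in PATTERNS:
        match = pattern.search(text)
        if match:
            # Captured words (if any) fill the numbered slots in the template.
            return template.format(*match.groups())
    return None


for spoken in (
    "Please tell me if I have any new messages",
    "do I have any new mail",
    "send the output to my desktop",
):
    print(spoken, "->", to_formal_command(spoken))
```

A None result is the point at which a system of this kind could query the user through the first client device for clarification of the command.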

A system for accessing a dialog system employing a
plurality of different clients, in accordance with the
invention, includes a device handling and abstraction system
adapted to provide input and output interfacing to a plurality
of different client devices. The device handling and abstraction
system receives commands from at least one client
device and converts the commands to a form acceptable to
a conversational system. The conversational system is
coupled to the device handling and abstraction system for
receiving converted commands. The conversational system
is adapted to interpret the converted commands based on a
context of the command (and possibly the device used to
present the command) to determine an appropriate application
responsive to the converted command. The conversational
system includes a device preference to which results
of executing the converted commands are sent. An application
abstraction system is coupled to the conversational
system and is adapted for determining which applications
are appropriate for executing the converted command. The
application abstraction system is further adapted to interface
with a plurality of applications and to shield the conversational
system from communications with the applications.
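
One way to picture the application abstraction system is the Python sketch below, in which a controller forwards each formal command to whichever wrapped application supports it and collects the responses. The class names, method names and stand-in applications are hypothetical and chosen only for illustration; the patent does not prescribe this interface.

```python
class ComponentAbstraction:
    """Wraps one application so callers never touch the application directly."""

    def __init__(self, name, handlers):
        self.name = name
        self._handlers = handlers  # formal command name -> callable

    def can_handle(self, command):
        return command in self._handlers

    def execute(self, command, **args):
        return self._handlers[command](**args)


class ComponentControl:
    """Forwards a command to every abstraction that supports it and gathers the responses."""

    def __init__(self):
        self._abstractions = []

    def register(self, abstraction):
        self._abstractions.append(abstraction)

    def dispatch(self, command, **args):
        responses = {}
        for abstraction in self._abstractions:
            if abstraction.can_handle(command):
                responses[abstraction.name] = abstraction.execute(command, **args)
        if not responses:
            raise LookupError(f"no application handles {command!r}")
        return responses


# Stand-in applications with made-up data.
inbox = ["Budget review", "Lunch?"]
control = ComponentControl()
control.register(ComponentAbstraction("mail", {"check_new_mail": lambda: len(inbox)}))
control.register(
    ComponentAbstraction("calendar", {"list_appointments": lambda day: [f"{day}: 9am standup"]})
)

print(control.dispatch("check_new_mail"))
```

Because the caller only sees command names and responses, an application can be swapped by replacing its wrapper while the conversational side stays unchanged.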

In alternate embodiments, the plurality of different client 
devices may include at least one of a telephone, a computer, 
a personal digital assistant, or equivalents. A command may 
be input to a first client and a result may also be received by 
the first client. The commands may include graphical com- 
mands and speech commands. The results of executing the 
converted commands may be conveyed to a client device as 
one of or both of synthesized speech and graphics. The 
device preference may include a plurality of client devices. 
The converted commands may include a formal language 
converted from natural speech. The conversational system 
may include a dialog manager and a multi-modal history to
determine the intent of the converted commands based on a
context of the commands. The conversational system preferably
includes a mediator which employs information about
at least one of the plurality of different client devices and a
client device of the device preference to determine the
context of the command.
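
The role of the mediator can be sketched in Python as a lookup keyed on the formal command together with the input and output devices. The table entries follow the examples given in the detailed description (dictation from a desktop versus audio recording from a telephone; a displayed calendar versus a spoken summary). The behavior names, the command names and the "*" wildcard are assumptions of this sketch and not part of the described system, which maps each combination to a decision network.

```python
# (formal command, input device, output device) -> behavior name.
# "*" is a wildcard used only in this sketch.
DECISION_TABLE = {
    ("compose_body", "desktop", "*"): "start_dictation",
    ("compose_body", "telephone", "*"): "record_audio_attachment",
    ("list_appointments", "*", "desktop"): "display_calendar",
    ("list_appointments", "*", "telephone"): "read_summary_aloud",
}


def mediate(command, input_device, output_device):
    """Pick a behavior, preferring the most specific matching entry."""
    candidates = [
        (command, input_device, output_device),
        (command, input_device, "*"),
        (command, "*", output_device),
        (command, "*", "*"),
    ]
    for key in candidates:
        if key in DECISION_TABLE:
            return DECISION_TABLE[key]
    raise LookupError(
        f"no behavior for {command!r} from {input_device} to {output_device}"
    )


print(mediate("compose_body", "desktop", "desktop"))
print(mediate("compose_body", "telephone", "pda"))
print(mediate("list_appointments", "telephone", "desktop"))
```

The same formal command yields different behavior purely because the devices differ, which is the context the mediator contributes.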

These and other objects, features and advantages of the 
present invention will become apparent from the following 
detailed description of illustrative embodiments thereof,
which is to be read in connection with the accompanying
drawings.

BRIEF DESCRIPTION OF DRAWINGS 

The invention will be described in detail in the following
description of preferred embodiments with reference to the
following figures wherein:

FIG. 1 is a block diagram of an illustrative system
architecture which supports multi-client access, in accordance
with the present invention;

FIG. 2 is a block/flow diagram of an example of a device
handling and abstraction system, in accordance with the
present invention;

FIG. 3 is a block/flow diagram of an example of a
conversational system, in accordance with the present
invention;

FIG. 4 is a block/flow diagram of an example of an
application abstraction system and applications, in accordance
with the present invention; and

FIG. 5 is a block/flow diagram of a system/method for
accessing a dialog system employing a plurality of different
clients, in accordance with the present invention.

DETAILED DESCRIPTION OF PREFERRED
EMBODIMENTS

The present invention is directed to conversational systems
which can be accessed from multiple client devices,
such as a desktop computer, a telephone, a personal digital
assistant (PDA) or other client devices, for example a pager,
etc. The invention provides methods for building a conversational
system that supports access from multiple client
devices, while preserving the "system personality" and
conversational context across the access methods. Further, at
the same time, the present invention customizes the presentation
of information for a particular client device. A conversational
speech interface in accordance with the present
invention can be accessed from a variety of client devices
and can form the basis for a seamless and pervasive interface
for information interaction. This invention also supports
multiple input modalities (conversational and graphical).

It should be understood that the elements shown in FIGS.
1-5 may be implemented in various forms of hardware,
software or combinations thereof. Preferably, these elements
are implemented in software on one or more appropriately
programmed general purpose digital computers having a
processor and memory and input/output interfaces. Referring
now to the drawings in which like numerals represent the same or
similar elements and initially to FIG. 1, an illustrative
example of a system architecture for a system 101 which
supports multi-client access, according to the present
invention, is shown. A user or users connect to the system
101 via client device 100, examples of which include a
computer, a telephone, a personal digital assistant (PDA),
etc. A device handling and abstraction subsystem 200 connects
the client device 100 to a conversational system
300. Conversational system 300 is responsible for natural
language understanding and dialog management, and connects
to an application abstraction system 400. The application
abstraction system 400 is responsible for communicating
with the application components included within
applications 500. Examples of applications may include
electronic mail applications, electronic calendar
applications, electronic address book applications, spreadsheet
applications, etc. Each component shown in FIG. 1 will
now be described in greater detail below.

Referring to FIG. 2, device handling and abstraction
subsystem 200, according to the present invention, is
schematically shown. The device handling and abstraction
subsystem 200 handles the connection to the client device 100.
For each type of client device, such as a desktop computer,
laptop computer, telephone, PDA, etc., there is a client
input-output 201, speech recognition system 202, speech
synthesis system 203 and speech interface 204. The client
input-output 201 receives the input from the user, which may
either be a graphical input or a spoken input, or a combination
of both, depending on the capabilities of the client
device 100. When the input is a spoken input, speech
recognition system 202 performs speech recognition on the
spoken input to generate recognized text corresponding to
the spoken input. Methods for performing speech recognition
are known in the art. The recognized text from speech
recognition system 202 is sent to the speech interface 204.
The speech interface 204 communicates with conversational
system 300, by providing the recognized text and the identity
of the client device 100 to the conversational system,
and by receiving the response from the conversational
system 300. In one embodiment of the invention, the
response from the conversational system 300 to the speech
interface 204 is in the form of a string of text to be converted
to speech and played back to the user. The conversion from
text to speech is performed by speech synthesis system 203.
Methods for text to speech conversion are known in the art.
The present invention may be able to utilize speech recognition
system 202, speech synthesis system 203 and/or
speech interface 204 for different client devices 100 or
multiple client devices of a same type, for example on a
computer network or a communications network.

Referring to FIG. 3, an illustrative example of the conversational
system 300, according to the present invention,
is schematically shown. The conversational system 300
includes a natural language understanding (NLU) unit 301,
a mediator 302, a dialog manager 303, a multi-modal history
304 and a device preference 305. The natural language
understanding unit 301 translates the recognized text from
the speech recognition system 202 to a formal command
corresponding to the user's intention. For example, in an
electronic mail application, the user may say, "Please tell me
if I have any new messages," or the user may say, "do I have
any new mail", and in both cases, the user's input may be
translated into a formal command which may be of the form:
check_new_mail(). The formal command, along with the
identity of the client device 100 that was used to generate
the input, is passed to mediator 302.

In accordance with the invention, both the formal language
statement and the identity of the input device are
passed to the mediator 302. The mediator 302 decides on
what decision network (or other element of dialog
management, if decision networks are not used) based not
only on the formal language statement, but also on the input
device 100 (the device that generated the input/command)
and the output device (specified in device preference 305).
The same formal language statement may result in a different
system behavior if the devices involved are different. For
example, if a command includes "compose the body of the
message" from a desktop, the system will start dictation
using speech recognition employed by the desktop computer.
However, if a user says the same thing from a
telephone, and given that speech recognition accuracy is
rather poor from a telephone for large vocabulary
composition, the system may instead start audio recording
and send the message as an audio attachment. Similarly, the
output device (from device preference 305) may also determine
the system behavior. For example, if the user says
"what appointments do I have tomorrow", the calendar for
tomorrow may be displayed on the desktop, but on a
telephone, the first few entries or a summary may be read
out. Therefore, in accordance with the present invention, the
system behavior may depend on the devices involved. The
mediator 302 communicates with the device preference 305,
which includes the user's preference for the client device
100 to which the output should be presented. In many cases,
the output client device 100 would be the same as the input
client device 100, but the user may specify a different client
device for the output when necessary. For example, the user
may choose to generate the input using a telephone, which
does not have a graphical display, and have the output sent
to a PDA, which does have a graphical display, or to another
user's client device. The user may specify the desired output
client device using either a graphical input or a spoken input.
Using graphical input, the user may open a user preference
file in device preference 305 and specify the desired output
client device. The output device may include a specific
device or a plurality of devices which may be the same or
different types of devices. Using spoken input, the user may
say "send the output to my desktop", which may be translated
to a command of the form
"select_device(output_device=desktop)", and the preference
will be set.

In one embodiment of the invention, the dialog manager
303 employs decision networks, as described in commonly
assigned U.S. application Ser. No. 09/374,744 entitled,
"METHOD AND SYSTEM FOR MODELESS OPERATION [...]
NETWORKS," Attorney Docket Y0999-277
(8728-300), filed concurrently herewith and incorporated
herein by reference. A decision network is a recipe(s) for
accomplishing a specific transaction. Other embodiments,
such as the embodiments described in U.S. application Ser.
No. 09/374,744 entitled, "METHOD AND SYSTEM FOR
DETERMINING AND MAINTAINING DIALOG FOCUS
IN A CONVERSATIONAL SPEECH SYSTEM," filed concurrently
herewith and incorporated herein by reference,
may also be used to build the dialog manager 303 and/or the
multi-modal history 304. It is to be understood that the
multi-modal history 304 captures all events, as well as the
devices 100 used to generate the events, and the modality
used. All of this information may be needed in some cases
to resolve ambiguities, etc. and determine the context of
input/commands.

Each combination of formal command and identity of the
input and output client devices 100 maps to one decision
network. The mediator 302 determines the appropriate decision
network to spawn, based on the formal command and
the identity of the input and output client devices. Once the
appropriate decision network is spawned, the dialog manager
303 communicates with the application abstraction
system 400 to accomplish the transaction represented by the
formal command. Once the transaction is completed, the
response is formatted according to the capabilities of the
output client device as specified in device preference 305,
and subsequently sent to the output client device for presentation
to the user. Formatting of the response is necessary
because different client devices will have different capabilities.
For example, if the user says "go to the next message,"
and the output client device is a desktop with a display, then
the response may be to highlight the next message, with no
audio output. But, if the output client device is a telephone,
with no display, then an audio output of form "message
selected" may be played out on the output client device.

The multi-modal history 304 captures all system events,
both spoken and graphical, from all client devices, and keeps
track of the system state. The dialog manager 303 uses the
multi-modal history 304 for disambiguation and reference
resolution. For example, if the user says "open that", then
the dialog manager 303 will communicate with the multi-modal
history 304 to resolve which object was referred to by
"that". Since all system events are recorded, the user may
mix different input modalities (spoken or graphical) on the
same client device, or on different client devices. For
example, the user may select a message using a mouse click
on the desktop, and later say "delete that" from a telephone,
and the selected message will be deleted.

Referring to FIG. 4, an illustrative example of the application
abstraction system 400 and applications 500, according
to the present invention, is schematically shown. The
application abstraction system 400 includes a component
control 401 and a set of component abstractions 402 corresponding
to each application component 501 in applications
500. Examples of application components 501 are electronic
mail, electronic calendars, electronic address books, etc. The
component control 401 serves to create instances of and
maintain references to the abstraction components 402. In
addition, the component control 401 functions as a
"switchyard" for forwarding commands to the appropriate
application component 501 and accumulating the responses. The
application components 501 shield the conversational system
300 (FIGS. 1 and 3) from the details of applications 500,
allowing a very high level of communication, since the
conversational system 300 does not need to have information
on how a specific command is to be accomplished. The
applications 500 may therefore be interchanged with no
changes to the conversational system 300; only
the abstraction components 402 need to be changed.

Referring to FIG. 5, a block/flow diagram is shown for a
system/method for accessing a dialog system employing a
plurality of different clients in accordance with the invention.
The client devices may include at least one of a
telephone, a computer, a personal digital assistant, etc. In
block 602, a first client device is provided for accessing a
conversational system. In block 604, a command is presented
to the conversational system by converting the command
to a form understandable to the conversational system,
for example converting a human utterance to a formal
language command.

In block 606, the command is interpreted by employing a
dialog manager and a multi-modal history to determine the
intent of the command based on a context of the command.
The context of the command may be based, at least in part,
on the client devices used as an input device or as an output
device. In block 607, a user may be queried through the
client device to determine additional information, for
example, information about the device preference or clarification
of the command information. In block 608, a second
client device is determined based on a predetermined device
preference stored in the conversational system. The predetermined
device preference may include the same device, a
plurality of devices or a specific other client device. In block
610, an application is abstracted to perform the command,
and the results of the performance of the command are sent
to the second client device. In block 612, the results of the
performance of the command are sent to the second client
device for speech synthesis, graphical representation, or
both depending on the client devices involved.

The present invention provides a conversational system
that supports access from multiple client devices using
multiple input modalities. Examples of client devices supported
may include desktop computers, telephones, and
personal digital assistants (PDAs). The invention describes
the overall architecture to support such a conversational
system, including the innovations incorporated to preserve
the personality and conversational context across multiple
access methods, and the innovations incorporated to customize
the presentation of information for individual client
devices.

Having described preferred embodiments of a method and
system for multi-client access to a dialog system (which are
intended to be illustrative and not limiting), it is noted that
modifications and variations can be made by persons skilled
in the art in light of the above teachings. It is therefore to be
understood that changes may be made in the particular
embodiments of the invention disclosed which are within the
scope and spirit of the invention as outlined by the appended
claims. Having thus described the invention with the details
and particularity required by the patent laws, what is claimed
and desired protected by Letters Patent is set forth in the
appended claims.

What is claimed is: 

1. A method for accessing a dialog system employing a 
plurality of different clients comprising: 

providing a first client device for accessing a conversa-
tional system;

presenting a command to the conversational system by
converting the command to a form understandable to
the conversational system;

interpreting the command by employing a mediator, a
dialog manager and a multi-modal history to determine
the intent of the command based on a context of the
command;

determining a second client device based on a predeter-
mined device preference;

abstracting an application to perform the command; and

sending results of the performance of the command to the
second client device.

2. The method as recited in claim 1, wherein the first client 
device is the same as the second client device.

3. The method as recited in claim 1, wherein the second 
client device is a plurality of client devices. 

4. The method as recited in claim 1, wherein the first client 
device and the second client device include at least one of a 
telephone, a computer and a personal digital assistant.

5. The method as recited in claim 1, wherein the command 
is presented in natural speech, the method further compris- 
ing the steps of recognizing the speech and converting the 
speech to a formal language. 

6. The method as recited in claim 5, further comprising
the steps of outputting the results of the performance of the
command to the second client device by speech synthesis.

7. The method as recited in claim 1, wherein the command 
is presented graphically, the method further comprising the 
steps of responding to the command by one or both of a 
graphical result and synthesized speech.

8. The method as recited in claim 1, further comprising
the step of providing a component abstraction interface to
interface with applications such that the conversational
system is shielded from details of execution of the application.

9. The method as recited in claim 1, further comprising 
the step of querying a user via the first client device for one of
information about the device preference and clarification of 
command information. 

10. The method as recited in claim 1, wherein the media- 
tor employs information about one of the first client device 
and the second client device to determine the context of the 
command. 

11. A program storage device readable by machine, tan- 
gibly embodying a program of instructions executable by the 
machine to perform method steps for accessing a dialog
system employing a plurality of different clients, the method 
steps comprising: 

providing a first client device for accessing a conversa- 
tional system; 

presenting a command to the conversational system by
converting the command to a form understandable to 
the conversational system; 

interpreting the command by employing a mediator, a 
dialog manager and a multi-modal history to determine
the intent of the command based on a context of the 
command; 



determining a second client device based on a predeter- 
mined device preference; 
abstracting an application to perform the command; and 
sending results of the performance of the command to the 
second client device. 

12. The program storage device as recited in claim 11, 
wherein the first client device is the same as the second 
client device. 

13. The program storage device as recited in claim 11, 
wherein the second client device is a plurality of client 
devices. 

14. The program storage device as recited in claim 11, 
wherein the first client device and the second client device 
include at least one of a telephone, a computer and a 
personal digital assistant. 

15. The program storage device as recited in claim 11, 
wherein the command is presented in natural speech, the 
method further comprising the steps of recognizing the 
speech and converting the speech to a formal language. 

16. The program storage device as recited in claim 15, 
further comprising the steps of outputting the results of the 
performance of the command to the second client device by 
speech synthesis. 

17. The program storage device as recited in claim 11, 
wherein the command is presented graphically, the method 
comprising the steps of responding to the command by one 
or both of a graphical result and synthesized speech. 

18. The program storage device as recited in claim 11, 
further comprising the step of providing a component abstrac-
tion interface to interface with applications such that the
conversational system is shielded from details of execution 
of the application. 

19. The program storage device as recited in claim 11, 
further comprising the step of querying a user via the first client
device for one of information about the device preference 
and clarification of command information. 

20. The program storage device as recited in claim 11, 
wherein the mediator employs information about one of the 
first client device and the second client device to determine 
the context of the command. 

21. A system for accessing a dialog system employing a 
plurality of different clients, comprising: 

a device handling and abstraction system adapted to 
provide input and output interfacing to a plurality of 
different client devices, the device handling and 
abstraction system for receiving commands from at 
least one client device and converting the commands to 
a form acceptable to a conversational system; 

the conversational system coupled to the device handling 
and abstraction system for receiving converted 
commands, the conversational system adapted to inter- 
pret the converted commands based on a context of the 
command to determine an appropriate application 
responsive to the converted command, the conversa- 
tional system including a device preference to which 
results of executing the converted commands are sent; 
and 

an application abstraction system coupled to the conver- 
sational system and adapted for determining which 
applications are appropriate for executing the con- 
verted command, the application abstraction system 
being adapted to interface with a plurality of applica- 
tions and to shield the conversational system from 
communications with the applications. 

22. The system as recited in claim 21, wherein the
plurality of different client devices include at least one of a 
telephone, a computer and a personal digital assistant. 

23. The system as recited in claim 21, wherein a command
is input to a first client and a result is received by the first
client.

24. The system as recited in claim 21, wherein the 
commands include graphical commands and speech com- 
mands. 

25. The system as recited in claim 21, wherein the results
of executing the converted commands are conveyed to a 
client device as one of or both of synthesized speech and 
graphics. 

26. The system as recited in claim 21, wherein the device 
preference includes a plurality of client devices. 

27. The system as recited in claim 21, wherein the
converted commands include a formal language converted 
from natural speech. 

28. The system as recited in claim 21, wherein the 
conversational system includes a dialog manager and a 
multimodal history to determine the intent of the converted 
commands based on a context of the commands. 

29. The system as recited in claim 21, wherein the 
conversational system includes a mediator which employs 
information about at least one of the plurality of different 
client devices and a client device of the device preference to 
determine the context of the command. 

* * * * * 